Abstract

Earlier, we introduced a direct method called fixation for the recovery 
of shape and motion in the general case. The method uses neither feature 
correspondence nor optical flow. Instead, it directly employs the spatio- 
temporal gradients of image brightnesses. 

This work reports the experimental results of applying some of our fixation
algorithms to a sequence of real images where the motion is a combination
of translation and rotation. These results show that parameters such as
the fixation patch size have crucial effects on the estimation of some motion
parameters.

Some of the critical issues involved in the implementation of our autonomous
motion vision system are also discussed here. Among those are
the criteria for the automatic choice of an optimum size for the fixation patch
and of an appropriate location for the fixation point, both of which are needed
for good estimates of the important motion parameters.

Finally, a calibration method is described for identifying the real location 
of the rotation axis in imaging systems. 
1 Introduction

Recovery of the relative motion between an observer and an environment, as well as the structure
of the environment, from time-varying images is the goal in motion vision. Much of the
earlier work on recovering motion has been based either on establishing correspondences
between the prominent features in the images of a sequence (correspondence) or on establishing
the velocity of points over the whole image, commonly referred to as the optical flow.

In general, identifying features here means determining gray-level corners. For images of
smooth objects, it is difficult to find good features or corners. Furthermore, the correspondence
problem has to be solved, that is, feature points from consecutive frames have to be
matched. On the other hand, the computation of the local flow field exploits a constraint
equation between the local brightness changes and the two components of the optical flow.
This only gives the component of flow in the direction of the brightness gradient. To compute
the full flow field, one needs additional constraints such as the heuristic assumption
that the flow field is locally smooth [5, 4]. This leads to an estimated optical flow field which
may not be the same as the true motion field.

The use of optical flow or correspondence techniques for solving motion vision problems 
has proven to be rather unreliable and computationally very expensive [16, 15, 7]. This has 
motivated the investigation of direct methods which use the image brightness information 
directly to recover the motion and shape. 

Previous work in direct motion vision has used the Brightness-Change Constraint Equation
(BCCE) for rigid body motion [8]
to solve special cases such as known depth [5], pure translation or known rotation [6], pure
rotation [6], and planar world [8]. All these direct methods are restricted in the types of
motion or shape that they can handle.

Recently, we introduced a direct method called fixation¹ for solving the motion vision
problem in the general case without placing restrictions on the motion or the shape
[12, 13, 11]. The fixation method is based on the theoretical proof that for a sequence of
fixated images (a sequence of images with one stationary image point in them), the 3D
rotational velocity $\omega$ can always be explicitly expressed as a linear function of the 3D
translational velocity $t$. Namely,

$$\omega = \omega_{\hat{R}_0}\,\hat{R}_0 + \frac{1}{\|R_0\|}\,(t \times \hat{R}_0) \qquad (2)$$

where $\hat{R}_0$ is the unit vector along the position vector $R_0$ of the fixation point (a point in the
image plane which stays stationary) and $\omega_{\hat{R}_0}$ is the component of rotational velocity about
the fixation axis $\hat{R}_0$.

It should be emphasized that we do not need to know the real fixation point, if there is any,
to take advantage of this fixation constraint equation (FCE), eqn. (2). In fact, our algorithm
allows us to choose virtually any point as the fixation point by a simple manipulation of one
of the images and obtain a sequence of fixated images [12, 13].

¹ The terms and notations used in this paper have been defined in our previous work such as [10] or [12].
For a review of the necessary background, the reader is encouraged to consult one of those references.

The combination of the Fixation Constraint Equation (FCE), eqn. (2), and the BCCE,
eqn. (1), offers a solution to the motion vision problem of arbitrary motion relative to an
arbitrary rigid environment. That is, it allows recovery of the depth map $Z$, the total 3D
rotational velocity $\omega$, and the 3D translational velocity $t$ without putting severe restrictions on the
motion or the shape [12, 13].

Fixation does not necessarily mean tracking! Our technique for obtaining fixated images
is not only simpler than the previous tracking methods but also more general. For
example, Aloimonos & Tsakiris [2] propose a method for tracking a target of known shape;
Bandopadhay et al. [3] use optical flow and feature correspondence for tracking the principal
point in order to find the motion in a special case (they assume that there is no rotation along
the optical axis) without considering noise; and Sandini & Tistarelli [9] use an optical flow
based tracking method for finding the depth in a special case (no rotation along the optical
axis). Also, Thompson [14] introduces an optical flow method for recovering the motion in the
special case where the rotational velocity along the optical axis is zero. His method requires
a sequence of images tracked at the principal point, but he acknowledges that the actual
implementation of such a tracking requirement in engineering systems is not yet possible.

On the other hand, our fixation method does not require tracked images as its input.
Instead, it introduces a pixel shifting process which constructs a sequence of fixated images
at any arbitrary point, chosen as the fixation point, and for any input sequence of images [12, 13].
This is done entirely in software without physically moving the camera for tracking. Besides
being reliable, our pixel shifting process is much simpler than those tracking methods.

This work reports the experimental results of applying some of the fixation algorithms to
real image sequences where the motion is a combination of translation and rotation. Finding
the fixation velocity (velocity at the fixation point) and the component of rotational velocity
about the fixation axis, $\omega_{\hat{R}_0}$, are important steps in our fixation method [12, 13]. The
results here show that the fixation velocity and $\omega_{\hat{R}_0}$ can be estimated satisfactorily if proper
parameter values are used.

Some of the crucial implementation issues of our fixation technique are also discussed
here. Among those are the autonomous selection of an optimum size for the fixation patch
(the image patch around the fixation point) based on an error norm (normalized error), and
the choice of an appropriate location for the fixation point.

And finally, a calibration method is described for identifying the real location of the 
rotation axis in imaging systems. 


2 The Effect of Fixation Patch Size 

Finding the fixation velocity (velocity at the fixation point) and the component of rotational
velocity about the fixation axis, $\omega_{\hat{R}_0}$, is an important step in our fixation method for recovering
the shape and motion from an arbitrary sequence of input images. This is because
in our method a pixel shifting process uses the fixation velocity to construct a sequence of
fixated images from an arbitrary sequence of input images. We also need $\omega_{\hat{R}_0}$ for computing
the total rotational velocity [12].
The algorithms used for recovering the fixation velocity and $\omega_{\hat{R}_0}$ obtain their input information
from the fixation patch (an image patch around the fixation point) [12, 13]. In order
to study the effect of the fixation patch size on the estimation of the desired motion parameters,
we have used a sequence of real images acquired at the Imaging Laboratory of Carnegie
Mellon University. Figures 1 and 2 show two of these 16-bit gray-level, 576 x 384 pixel
images. The camera has a nominal focal length of 24 mm and a pixel size of 0.02 x 0.02 mm.
The calibrated principal point has been used as the fixation point. In the raster format system
(origin at the top left corner of the image), the principal point is located near the center of
the image, at pixel (275,205). The frontal depth of this point is about 1450 mm.



Figure 1: The first image in the landscape image sequence.

The real motion between these two images has both translational and rotational components.
The real rotation is -0.3 degrees about the optical axis Z. The real translation is -2
mm along the horizontal axis X. Testing our algorithms using such real images is valuable
because the observed motion is relatively large (more than subpixel motion in the image
plane). Very large motions can be handled simply by using higher frame grabbing rates. These days,
there are commercially available frame grabbers which are capable of capturing up to 7,500
frames per second at 12-bit gray scale resolution on personal computers [1].
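As a rough check of the "more than subpixel" claim, the image-plane displacements can be estimated from the nominal camera parameters quoted above. The sketch below assumes simple perspective projection (image shift = f * translation / depth) and a small-angle rotation about the principal point; the function names and the 50-pixel radius are illustrative choices, not part of the original experiment.

```python
import math

def translation_shift_pixels(translation_mm, depth_mm, focal_mm, pixel_mm):
    """Image shift (pixels) of a point at depth_mm under a lateral camera translation."""
    shift_mm = focal_mm * translation_mm / depth_mm  # perspective projection
    return shift_mm / pixel_mm

def rotation_shift_pixels(angle_deg, radius_pixels):
    """Image shift (pixels) at a given radius from the rotation center, small-angle model."""
    return math.radians(angle_deg) * radius_pixels

# Nominal values from the landscape sequence: 2 mm translation (magnitude),
# 1450 mm depth, 24 mm focal length, 0.02 mm pixels, 0.3 degree rotation.
t_shift = translation_shift_pixels(2.0, 1450.0, 24.0, 0.02)
r_shift = rotation_shift_pixels(0.3, 50.0)  # 50 pixels from the principal point (assumed)
print(f"translation: {t_shift:.3f} px, rotation at 50 px radius: {r_shift:.3f} px")
```

Under these assumptions the translation alone moves the fixation point by more than one pixel, which is consistent with the statement that the observed motion exceeds subpixel size.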

2.1 Estimation of rotational velocity component, $\omega_{\hat{R}_0}$

Figure 2: The second image in the landscape image sequence after undergoing a real motion of
-0.3 degree rotation about the nominal optical axis Z, and -2 mm translation along the horizontal
axis X.

The motion field velocity due to the component of the rotational velocity of an observer
relative to an environment along $\hat{R}_0$ is given by

$$-(\boldsymbol{\omega}_{\hat{R}_0} \times r) = -\omega_{\hat{R}_0}(\hat{R}_0 \times r) = -\frac{\omega_{\hat{R}_0}}{\|r_0\|}(r_0 \times r),$$

where $\hat{R}_0 = \hat{r}_0$ is the unit vector along $r_0$, the position vector of the fixation point in a
viewer-centered coordinate system. Assuming that depth is approximately the same on the fixation
patch (a small patch around the fixation point) and substituting for $r_0 = (x_0\ y_0\ 1)^T$ and
$r = (x\ y\ 1)^T$, we can write the components of the total motion field, due to the fixation
velocity $(u_0, v_0)$ and $\omega_{\hat{R}_0}$, as

$$\begin{aligned}
x_t &= u_0 - \omega_{r_0}\,\hat{x} \cdot (r_0 \times r) = u_0 + \omega_{r_0}(y - y_0) \\
y_t &= v_0 - \omega_{r_0}\,\hat{y} \cdot (r_0 \times r) = v_0 - \omega_{r_0}(x - x_0)
\end{aligned} \qquad (3)$$

where $\hat{x}$ and $\hat{y}$ are the unit vectors along the x and y axes and $\omega_{r_0} = \omega_{\hat{R}_0} / \|r_0\|$.
Ideally, the BCCE must be satisfied at any point on the fixation patch:

$$x_t E_x + y_t E_y + E_t = 0. \qquad (4)$$

Substituting for $x_t$ and $y_t$ from eqn. (3) into the BCCE, eqn. (4), gives

$$[u_0 + \omega_{r_0}(y - y_0)]E_x + [v_0 - \omega_{r_0}(x - x_0)]E_y + E_t = 0. \qquad (5)$$

Due to noise, eqn. (5) does not necessarily hold for every pixel $(x, y)$, so we find $u_0$, $v_0$ and
$\omega_{r_0}$ by minimizing the sum of squares of errors over the fixation patch. In other words, we
want to minimize

$$\iint [(u_0 + \omega_{r_0}(y - y_0))E_x + (v_0 - \omega_{r_0}(x - x_0))E_y + E_t]^2\,dx\,dy \qquad (6)$$
with respect to $u_0$, $v_0$ and $\omega_{r_0}$. This results in a system of three linear equations that can
be solved for the three unknowns:

$$A\,(u_0\ \ v_0\ \ \omega_{r_0})^T = C. \qquad (7)$$

Matrix $A$ is symmetric and its elements are given by

$$\begin{aligned}
a_{11} &= \iint E_x^2\,dx\,dy \\
a_{12} &= \iint E_x E_y\,dx\,dy \\
a_{13} &= \iint E_x[E_x(y - y_0) - E_y(x - x_0)]\,dx\,dy \\
a_{22} &= \iint E_y^2\,dx\,dy \\
a_{23} &= \iint E_y[E_x(y - y_0) - E_y(x - x_0)]\,dx\,dy \\
a_{33} &= \iint [E_x(y - y_0) - E_y(x - x_0)]^2\,dx\,dy
\end{aligned} \qquad (8)$$

and the components of vector $C$ are as follows:

$$\begin{aligned}
c_1 &= -\iint E_t E_x\,dx\,dy \\
c_2 &= -\iint E_t E_y\,dx\,dy \\
c_3 &= -\iint E_t[E_x(y - y_0) - E_y(x - x_0)]\,dx\,dy.
\end{aligned} \qquad (9)$$

Since the fixation point coordinates $x_0$ and $y_0$ are known, the sets of equations
in (8) and (9) show that the elements of matrix $A$ and the components of vector $C$ are
fully computable. After finding $\omega_{r_0}$, we can easily calculate $\omega_{\hat{R}_0}$ as

$$\omega_{\hat{R}_0} = \omega_{r_0}\sqrt{x_0^2 + y_0^2 + 1}. \qquad (10)$$

In the special case where the fixation point is at the principal point, $x_0 = y_0 = 0$, the elements
of matrix $A$ and the components of the vector $C$ are simplified further and $\omega_{\hat{R}_0}$ becomes
equal to $\omega_{r_0}$.
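The least-squares step described in this subsection can be sketched in a few lines. The code below is a minimal illustration, not the original implementation: it assumes the brightness gradients are already available as samples `(x, y, Ex, Ey, Et)` in normalized image coordinates, replaces the integrals by sums over the patch, and uses a hypothetical helper `solve_linear` for the 3 x 3 system. The synthetic gradients are invented solely to check that known parameters are recovered.

```python
import math

def solve_linear(A, c):
    """Solve A z = c by Gaussian elimination with partial pivoting."""
    n = len(c)
    M = [list(A[i]) + [c[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: the patch carries too little information")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][k] * z[k] for k in range(i + 1, n))) / M[i][i]
    return z

def estimate_motion(samples, x0, y0):
    """Return (u0, v0, w_R0) from patch samples by accumulating A and C as sums."""
    A = [[0.0] * 3 for _ in range(3)]
    C = [0.0] * 3
    for x, y, Ex, Ey, Et in samples:
        g = Ex * (y - y0) - Ey * (x - x0)   # coefficient of w_r0 in the BCCE
        row = (Ex, Ey, g)
        for i in range(3):
            C[i] -= Et * row[i]
            for j in range(3):
                A[i][j] += row[i] * row[j]
    u0, v0, w_r0 = solve_linear(A, C)
    w_R0 = w_r0 * math.sqrt(x0 * x0 + y0 * y0 + 1.0)  # rescale to the unit-vector component
    return u0, v0, w_R0

# Synthetic, noise-free patch (made-up gradients) that satisfies the BCCE exactly.
true_u0, true_v0, true_w = 0.002, -0.001, -0.005
patch = []
for i in range(-5, 6):
    for j in range(-5, 6):
        x, y = 0.01 * i, 0.01 * j
        Ex, Ey = math.sin(0.7 * i + 0.3 * j), math.cos(0.2 * i - 0.5 * j)
        Et = -((true_u0 + true_w * y) * Ex + (true_v0 - true_w * x) * Ey)
        patch.append((x, y, Ex, Ey, Et))

u0, v0, w = estimate_motion(patch, 0.0, 0.0)
```

With real images the gradients are noisy, so the recovered values depend on the patch size, which is the effect studied in the rest of this section.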

Using these algorithms, we can find $\omega_{\hat{R}_0}$ for any given fixation patch size. Figure 3 shows
that for small patch sizes (less than 30 x 30 pixels in this case) the estimated value of $\omega_{\hat{R}_0}$
oscillates wildly and takes unacceptable values. As the patch size increases, the
estimated $\omega_{\hat{R}_0}$ converges towards the real value of rotation. For large patch sizes (around
100 x 100 pixels in this case) the estimated rotation, -0.309 degrees, becomes roughly the
same as the real rotation, -0.3 degrees.

It can be seen that the size of the fixation patch has a critical effect on the estimated value of
the component of rotational velocity about the fixation axis, $\omega_{\hat{R}_0}$. A small patch size results
in a value for $\omega_{\hat{R}_0}$ which is usually far from the real value. This is possibly because
in a small patch, small translations can be interpreted as large rotations. Figure 4 shows a
hypothetical situation where (a) and (b) are a sequence of a small 3 x 3 pixel patch. The real
motion in this case is most likely a one-pixel-high vertical translation. But if we try to interpret
it as a rotation about the patch center, we end up with a rotation of 45 degrees, which is
not acceptable considering the assumed small motion between images. In conclusion, we
should use relatively large patch sizes in order to obtain good estimates for the rotational
velocity component about the fixation axis, $\omega_{\hat{R}_0}$.
Figure 3: Estimated value of the component of rotational velocity about the fixation axis, $\omega_{\hat{R}_0}$, versus
the fixation patch size for the landscape image sequence. For large patch sizes, the estimated value
of $\omega_{\hat{R}_0}$ converges towards the real value, -0.3 degrees.

3 Autonomous Choice of Optimum Fixation Patch Size

The experimental results and explanations in the previous section suggest that relatively
large patch sizes should be used in order to get a good estimate for the component of the
rotation along the fixation axis, $\omega_{\hat{R}_0}$. On the other hand, we know that in general a large
patch size will result in a wrong estimate for the fixation velocity because depth variations
generally increase as the patch size increases. In this section, we will describe a technique
for choosing an optimum fixation patch size which results in a good estimate for the fixation
velocity.
Figure 4: Using a small fixation patch can result in a wrong interpretation as a large rotation. In a
patch of 3 x 3 pixels, a one-pixel-high vertical translation can be seen as a 45 degree rotation, which is
not an acceptable answer at all, considering the finite motion between images.


3.1 Computing the fixation velocity 


We can find a good estimate for $\omega_{\hat{R}_0}$ using a relatively large patch, but the corresponding
fixation velocity estimate from such large patches is not usually reliable. Using only the
estimate of $\omega_{\hat{R}_0}$ acquired from a large patch, we can write the total motion field at any
point $(x, y)$ on a small patch around the fixation point (fixation patch). As we showed in
subsection 2.1,

$$\begin{aligned}
x_t &= u_0 + \frac{\omega_{\hat{R}_0}}{\sqrt{x_0^2 + y_0^2 + 1}}\,(y - y_0) \\
y_t &= v_0 - \frac{\omega_{\hat{R}_0}}{\sqrt{x_0^2 + y_0^2 + 1}}\,(x - x_0)
\end{aligned} \qquad (11)$$

where $(x_0, y_0)$ is the position of the fixation point (located in the image plane), and $(u_0, v_0)$ is
the fixation velocity that we are about to estimate. After substituting for $x_t$ and $y_t$ into the
BCCE, eqn. (4), we will have

$$\left[u_0 + \frac{\omega_{\hat{R}_0}}{\sqrt{x_0^2 + y_0^2 + 1}}\,(y - y_0)\right]E_x + \left[v_0 - \frac{\omega_{\hat{R}_0}}{\sqrt{x_0^2 + y_0^2 + 1}}\,(x - x_0)\right]E_y + E_t = 0. \qquad (12)$$


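However, due to noise, the above equation does not necessarily hold for every pixel. As a
result, we find $u_0$ and $v_0$ by minimizing the sum of squared errors over the whole fixation
patch $P$. Namely, by minimizing

$$\iint_P \left\{\left[u_0 + \frac{\omega_{\hat{R}_0}}{\sqrt{x_0^2 + y_0^2 + 1}}\,(y - y_0)\right]E_x + \left[v_0 - \frac{\omega_{\hat{R}_0}}{\sqrt{x_0^2 + y_0^2 + 1}}\,(x - x_0)\right]E_y + E_t\right\}^2 dx\,dy \qquad (13)$$

with respect to $u_0$ and $v_0$. This will result in the following system of linear equations

$$\begin{pmatrix} \iint_P E_x^2\,dx\,dy & \iint_P E_x E_y\,dx\,dy \\ \iint_P E_x E_y\,dx\,dy & \iint_P E_y^2\,dx\,dy \end{pmatrix}
\begin{pmatrix} u_0 \\ v_0 \end{pmatrix} =
\begin{pmatrix}
\iint_P \left(\frac{\omega_{\hat{R}_0}}{\sqrt{x_0^2 + y_0^2 + 1}}\,((x - x_0)E_y - (y - y_0)E_x) - E_t\right)E_x\,dx\,dy \\
\iint_P \left(\frac{\omega_{\hat{R}_0}}{\sqrt{x_0^2 + y_0^2 + 1}}\,((x - x_0)E_y - (y - y_0)E_x) - E_t\right)E_y\,dx\,dy
\end{pmatrix} \qquad (14)$$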
that can be solved for the two unknowns $u_0$ and $v_0$. Note that $\omega_{\hat{R}_0}$ has already been
computed and is a known value in this equation.
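A minimal sketch of this two-unknown solve is given below. It assumes the same sample layout as before, `(x, y, Ex, Ey, Et)` in normalized image coordinates, replaces the integrals by sums, and solves the 2 x 2 system by Cramer's rule; the function name and the synthetic patch are illustrative assumptions rather than the original code.

```python
import math

def estimate_fixation_velocity(samples, x0, y0, w_R0):
    """Solve the 2 x 2 least-squares system for (u0, v0) with w_R0 held fixed."""
    k = w_R0 / math.sqrt(x0 * x0 + y0 * y0 + 1.0)
    a11 = a12 = a22 = b1 = b2 = 0.0
    for x, y, Ex, Ey, Et in samples:
        # Right-hand side term: known rotational part moved across, minus E_t.
        rhs = k * ((x - x0) * Ey - (y - y0) * Ex) - Et
        a11 += Ex * Ex
        a12 += Ex * Ey
        a22 += Ey * Ey
        b1 += rhs * Ex
        b2 += rhs * Ey
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("degenerate patch: brightness gradients do not constrain both components")
    u0 = (b1 * a22 - a12 * b2) / det
    v0 = (a11 * b2 - a12 * b1) / det
    return u0, v0

# Synthetic, noise-free patch (made-up gradients) consistent with the constraint.
x0, y0, w_R0 = 0.02, -0.01, -0.005
k = w_R0 / math.sqrt(x0 * x0 + y0 * y0 + 1.0)
true_u0, true_v0 = 0.003, 0.0005
patch = []
for i in range(-4, 5):
    for j in range(-4, 5):
        x, y = x0 + 0.01 * i, y0 + 0.01 * j
        Ex, Ey = math.sin(0.9 * i - 0.4 * j), math.cos(0.3 * i + 0.6 * j)
        Et = -((true_u0 + k * (y - y0)) * Ex + (true_v0 - k * (x - x0)) * Ey)
        patch.append((x, y, Ex, Ey, Et))

u0, v0 = estimate_fixation_velocity(patch, x0, y0, w_R0)
```

Keeping the rotational component fixed means only the 2 x 2 gradient matrix has to be well conditioned on the small patch, which is the property examined later when choosing the fixation point.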

Figure 5 shows the estimated values of the horizontal translation $U$ for the landscape
image sequence for different sizes of the fixation patch; $U$ is computed from the fixation velocity
using the focal length $f$ and the depth $Z_0$ at the fixation point. It can be seen that $U$ converges
towards the real horizontal translation, -2 mm. The dependency of $U$ on the patch size is quite clear in this
figure.

In practice, we do not know the real fixation velocity, and therefore we cannot select an 
appropriate fixation patch size by checking the computed values of fixation velocity. In order 
to solve this problem, we should find an autonomous way of choosing an optimum size for 
the fixation patch. 

3.2 Normalized error 

We showed that for any given size of the fixation patch, we can find the fixation velocity
components, $u_0$ and $v_0$. Also, the component of the rotational velocity about the fixation
axis, $\omega_{\hat{R}_0}$, can be estimated using a relatively large patch. Knowing these values, the motion
field velocity $(x_t, y_t)$ at any point $(x, y)$ in the image plane is given by eqn. (11). Ideally, for
any given image point $(x, y)$ the BCCE, eqn. (4), must be satisfied. However, in practice we
are dealing with real images which are noisy and, as a result, the term $x_t E_x + y_t E_y + E_t$ does
not usually become zero. This term can be considered as an error term for the corresponding
pixel. In a patch of size p x p pixels, we can add the squares of these error terms to define the normalized
error as

$$e = \frac{\sum [x_t E_x + y_t E_y + E_t]^2}{p^2}. \qquad (15)$$

This definition allows us to compare the performance of different patch sizes by studying 
the behavior of the normalized error e with respect to the changes in the patch size p. This 
consideration makes it possible for us to find an optimum patch size. 
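The normalized error can be sketched as a mean of squared BCCE residuals. The code below is an illustration under stated assumptions: samples are `(x, y, Ex, Ey, Et)` tuples, and the sum is divided by the number of samples supplied, which equals p squared only when every pixel of a p x p patch contributes a term. The synthetic data are invented to check the arithmetic.

```python
import math

def normalized_error(samples, x0, y0, u0, v0, w_R0):
    """Mean squared BCCE residual over the patch for given motion estimates."""
    k = w_R0 / math.sqrt(x0 * x0 + y0 * y0 + 1.0)
    total = 0.0
    count = 0
    for x, y, Ex, Ey, Et in samples:
        xt = u0 + k * (y - y0)          # motion field, x component
        yt = v0 - k * (x - x0)          # motion field, y component
        residual = xt * Ex + yt * Ey + Et
        total += residual * residual
        count += 1
    return total / count

def synthetic_patch(u0, v0, w_R0, offset=0.0):
    """Made-up gradients; `offset` adds a constant error to every E_t."""
    patch = []
    for i in range(-3, 4):
        for j in range(-3, 4):
            x, y = 0.01 * i, 0.01 * j
            Ex, Ey = math.sin(0.5 * i + 0.2 * j), math.cos(0.4 * i - 0.3 * j)
            Et = -((u0 + w_R0 * y) * Ex + (v0 - w_R0 * x) * Ey) + offset
            patch.append((x, y, Ex, Ey, Et))
    return patch

e_exact = normalized_error(synthetic_patch(0.002, 0.001, -0.005), 0.0, 0.0, 0.002, 0.001, -0.005)
e_noisy = normalized_error(synthetic_patch(0.002, 0.001, -0.005, offset=0.1), 0.0, 0.0, 0.002, 0.001, -0.005)
```

Because the value is an average, curves computed for different patch sizes can be compared directly, which is what the following subsections rely on.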

3.3 Case I: Small changes in relative depth as p increases 

Figure 6 shows the normalized error versus the fixation patch size for the landscape image 
sequence. Although this plot corresponds to a specific image and motion, it shows one of 
the two typical representations of the normalized error behavior as the patch size increases. 
As shown in this figure, the normalized error first increases with the patch size and reaches 
a peak and then dips down. 
Figure 5: Estimated value of the horizontal component of translational velocity, along the X-axis,
versus the fixation patch size for the landscape image sequence.

This is because initially, for the smallest patch size (3 x 3 pixels), the algorithm finds the
motion estimates that make the BCCE error term $(x_t E_x + y_t E_y + E_t)$ as small as possible.
In a 3 x 3 pixel patch, there is only one BCCE error term, which corresponds to the central
pixel of the patch. The algorithm does a good job in minimizing this error term, but the
motion estimates are usually very bad at this level because there is not enough
data available to the algorithm.

At the next level, we have a patch of 5 x 5 pixels, which includes 9 different placements
of the basic 3 x 3 pixel block. There is still not enough data for the algorithm to
come up with good motion estimates, but it finds parameters which minimize the sum
of the BCCE error terms. Usually, the algorithm is not as successful as it was for the 3 x 3
pixel patch because it must deal with 9 error terms instead of one, and this results
in a higher normalized error.

As we increase the patch size, the struggle between providing more data to the algorithm
and satisfying more error terms continues and, for relatively small patch sizes, results in a higher
normalized error. The normalized error increases until it reaches a peak where the role
of more input data becomes more important than satisfying more error terms. From then on,
increasing the patch size provides more useful data to the algorithm, which
gives a better motion estimate and results in a smaller normalized error.

Figure 6: Estimated value of the normalized error e versus the fixation patch size for the landscape
image sequence.

After dipping down, the normalized error stays roughly the same in this case because the
relative depth variation does not change much with the patch size (Fig. 6). The optimum
patch size in this example occurs around 100 x 100 pixels, which corresponds to the start of
the small (roughly flat) normalized error slope after the first peak. In this example, relative
depth changes are small (1250 mm to 1625 mm, about 30% difference) and stay roughly the
same as the patch size increases.
3.4 Case II: Significant changes in relative depth as p increases

In this section we will study another image sequence where there are considerable relative
depth changes as we increase the patch size. The purpose is to see how the normalized error
behaves in this case. Figures 7 and 8 show a sequence of two 227 x 280 pixel, 32-bit images
(cup images). The real motion of the camera is a horizontal translation of 2.5 mm to the
right. The camera has a nominal focal length of 18.66 mm, a pixel width of 0.032 mm, and a
pixel height of 0.029 mm. We have used the nominal principal point (image center) as our
fixation point.

Figure 7: The first image in the cup image sequence.

Figure 9 shows the estimates for the horizontal translation, vertical translation, and the
rotational velocity component $\omega_{\hat{R}_0}$, which are obtained using the same algorithms used for
the landscape image sequence. It is obvious that the estimated values depend strongly on the
size of the fixation patch. However, we can find good estimates for these motion parameters
if we choose the right fixation patch size.

The normalized error for this sequence of cup images is shown in Fig. 10. As before,
the normalized error first increases, and after reaching a peak it dips down and then grows
with the patch size again. This is because in the beginning, insufficient information results
in extremely wrong estimates, especially for the rotational component, and this causes the
normalized error to increase with the patch size. As we provide more and more data
to the algorithm, we obtain better estimates for the motion components and this decreases
the normalized error. If we increase the patch size beyond an optimum patch size, which
occurs at about 50 pixels in this example, the normalized error starts increasing again. In this
50 x 50 pixel patch, we have a considerable amount of relative depth change (from 584 mm
to 914 mm, about a 60% increase). Such significant relative depth variation leads to wrong
fixation velocity estimates, which in turn result in a larger normalized error.
Figure 8: The second image in the cup image sequence after 2.5 mm horizontal motion of
the camera to the right.

As one might expect, the optimum fixation patch size depends on the patch topology and
texture, which may vary not only from image to image but also from patch to patch in a single
image. However, the general pattern of the normalized error allows us to autonomously find
an optimum fixation patch size which gives good estimates for the fixation velocity components.
This optimum fixation patch size corresponds to either the minimum normalized
error after the first peak (in the case where there is considerable change in the relative depth,
as in the cup image sequence), or the point where the normalized error starts changing slowly
with the patch size (in the case where the relative depth does not change significantly with
the patch size, as in the landscape image sequence).
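The selection rule stated above can be sketched as a scan over the normalized error curve. This is only one possible reading of the rule: the flatness threshold `flat_tol` (a fraction of the peak error) is an assumed parameter, and the two example curves are invented for illustration; they are not the measurements plotted in Figs. 6 and 10.

```python
def choose_patch_size(sizes, errors, flat_tol=0.05):
    """Pick a patch size from a normalized-error curve sampled at increasing sizes."""
    # Step 1: climb to the first peak of the curve.
    peak = 0
    while peak + 1 < len(errors) and errors[peak + 1] >= errors[peak]:
        peak += 1
    # Step 2: walk down from the peak.
    i = peak
    while i + 1 < len(errors):
        drop = errors[i] - errors[i + 1]
        if drop <= 0:
            # Error starts rising again: minimum after the first peak (Case II).
            return sizes[i]
        if i > peak and drop < flat_tol * errors[peak]:
            # Error has started changing slowly: roughly flat region (Case I).
            return sizes[i]
        i += 1
    return sizes[-1]

# Case I style curve: rise, fall, then flatten (illustrative numbers).
sizes_flat = [10, 20, 30, 40, 60, 80, 100, 120, 140]
errors_flat = [1.0, 2.0, 3.0, 2.5, 1.8, 1.2, 0.9, 0.88, 0.87]
best_flat = choose_patch_size(sizes_flat, errors_flat)

# Case II style curve: rise, fall to a minimum, then rise again (illustrative numbers).
sizes_dip = [10, 20, 30, 40, 50, 60, 70]
errors_dip = [1.0, 2.5, 4.0, 3.0, 2.0, 2.6, 3.5]
best_dip = choose_patch_size(sizes_dip, errors_dip)
```

On noisy curves a smoothing step before the scan would probably be needed; that choice is left open here.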

4 Autonomous Choice of an Appropriate Fixation Point 

In general, our fixation algorithms do not put any restrictions on the choice of the fixation
point location, and virtually any point can be chosen as the fixation point. Among all points,
the choice of the principal point (image center) makes the formulations simpler. However, in
practice, one should take some care in choosing an appropriate fixation point. Most
significantly, the motion of the chosen fixation point should be detectable using the information
from its corresponding patch. To clarify this, consider a patch which has
uniform brightness. Choosing the center of such a patch as the fixation point will not be
useful, because the motion of such a point is irrecoverable using only the information from
that patch.

Figure 9: The estimated values for the horizontal translation, vertical translation, and the
rotational velocity component, $\omega_{\hat{R}_0}$, versus fixation patch size in the cup image sequence.

Similar to subsection 3.1 (with the exception that $\omega_{\hat{R}_0} = 0$ here), the least squares method can be
applied to the BCCE terms to obtain the following system of linear equations for the uniform
motion field $(u, v)$ on the patch:

$$\begin{pmatrix} \iint_P E_x^2\,dx\,dy & \iint_P E_x E_y\,dx\,dy \\ \iint_P E_x E_y\,dx\,dy & \iint_P E_y^2\,dx\,dy \end{pmatrix}
\begin{pmatrix} u \\ v \end{pmatrix} =
\begin{pmatrix} -\iint_P E_t E_x\,dx\,dy \\ -\iint_P E_t E_y\,dx\,dy \end{pmatrix}. \qquad (16)$$

It is obvious that the solution for $(u, v)$ exists if the determinant of the above matrix,

$$D = \left(\iint_P E_x^2\,dx\,dy\right)\left(\iint_P E_y^2\,dx\,dy\right) - \left(\iint_P E_x E_y\,dx\,dy\right)^2, \qquad (17)$$

is not zero. But this is still not enough because it does not guarantee that the patch is an
appropriate one.

Figure 10: The normalized error versus fixation patch size for the cup image sequence.

If we denote the smaller eigenvalue of the coefficient matrix in eqn. (16) by $\lambda_s$,

$$\lambda_s = \frac{1}{2}\left[\iint_P (E_x^2 + E_y^2)\,dx\,dy - \sqrt{\left(\iint_P (E_x^2 - E_y^2)\,dx\,dy\right)^2 + 4\left(\iint_P E_x E_y\,dx\,dy\right)^2}\,\right] \qquad (18)$$

then we can define a good fixation point as a point whose corresponding patch has the
largest $\lambda_s$. Using such a patch not only guarantees a solution ($D \neq 0$) but also ensures that
our solution $(u, v)$ is not sensitive to noise errors in the coefficient matrix of eqn. (16). It is
simple to implement this criterion for the autonomous choice of a good fixation point even in real
noisy images.
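The smaller eigenvalue in eqn. (18) is the closed-form smaller root for a symmetric 2 x 2 matrix, so it can be computed directly from three accumulated sums. The sketch below assumes the gradients of a patch are given as `(Ex, Ey)` pairs; the function name and the three toy patches are illustrative.

```python
import math

def smaller_eigenvalue(gradients):
    """Smaller eigenvalue of [[sum Ex^2, sum ExEy], [sum ExEy, sum Ey^2]]."""
    sxx = sxy = syy = 0.0
    for Ex, Ey in gradients:
        sxx += Ex * Ex
        sxy += Ex * Ey
        syy += Ey * Ey
    return 0.5 * ((sxx + syy) - math.sqrt((sxx - syy) ** 2 + 4.0 * sxy * sxy))

uniform_patch = [(0.0, 0.0)] * 9           # uniform brightness: no gradient at all
edge_patch = [(1.0, 0.0)] * 9              # a straight edge: gradient in one direction only
corner_patch = [(1.0, 0.0), (0.0, 1.0)]    # gradients in two independent directions

lam_uniform = smaller_eigenvalue(uniform_patch)
lam_edge = smaller_eigenvalue(edge_patch)
lam_corner = smaller_eigenvalue(corner_patch)
```

Both the uniform patch and the pure edge give a zero smaller eigenvalue, matching the point that a nonzero solution requires brightness variation in two directions; only the corner-like patch scores above zero.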

We have addressed the question of finding an appropriate fixation point (the center of 
a fixation patch) among a number of given patches. But which patches should we check in 
the first place? We can search the whole image for a globally optimum location of a fixation 
point as follows: 

1- Divide the whole image into 4 quadrants and find the corresponding $\lambda_s$ for each
quadrant.

2- Use the quadrant with the largest $\lambda_s$ as a new base image.

3- Repeat steps 1 and 2 until reaching a quadrant with an acceptable size.
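The three steps above can be sketched as a simple loop over shrinking quadrants. The code assumes the spatial gradients are available as two 2D lists `Ex` and `Ey` indexed `[row][col]`, and that "acceptable size" is expressed by a hypothetical `min_size` parameter; the tiny 16 x 16 test image is made up.

```python
import math

def region_lambda(Ex, Ey, top, left, height, width):
    """Smaller eigenvalue of the gradient matrix accumulated over one region."""
    sxx = sxy = syy = 0.0
    for r in range(top, top + height):
        for c in range(left, left + width):
            sxx += Ex[r][c] * Ex[r][c]
            sxy += Ex[r][c] * Ey[r][c]
            syy += Ey[r][c] * Ey[r][c]
    return 0.5 * ((sxx + syy) - math.sqrt((sxx - syy) ** 2 + 4.0 * sxy * sxy))

def find_fixation_point(Ex, Ey, min_size=4):
    """Repeatedly keep the quadrant with the largest smaller eigenvalue."""
    top, left = 0, 0
    height, width = len(Ex), len(Ex[0])
    while height // 2 >= min_size and width // 2 >= min_size:
        h2, w2 = height // 2, width // 2
        quadrants = [(top, left), (top, left + w2), (top + h2, left), (top + h2, left + w2)]
        top, left = max(quadrants,
                        key=lambda q: region_lambda(Ex, Ey, q[0], q[1], h2, w2))
        height, width = h2, w2
    # Report the center of the final quadrant as (row, col).
    return top + height // 2, left + width // 2

# Toy 16 x 16 gradient images: texture only near row 12, columns 12 and 13.
Ex = [[0.0] * 16 for _ in range(16)]
Ey = [[0.0] * 16 for _ in range(16)]
Ex[12][12] = 1.0
Ey[12][13] = 1.0
fixation = find_fixation_point(Ex, Ey, min_size=4)
```

A greedy descent like this can miss a good patch that straddles a quadrant boundary, which is one reason a local search near the principal point may be preferable in practice.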

Doing such a comprehensive search may not always be necessary. Instead, we can check a
limited number of neighboring patches (near the principal point, for convenience) and choose
the center of the one with the largest $\lambda_s$ as the fixation point.


5 Calibration of the Rotation Axis 


In our experiment on the landscape images, we did not explicitly apply any vertical
translation (along the Y axis). However, the experimental results in Fig. 11 show a vertical
translation of about -0.9 mm. This is mainly because the real rotation axis does not pass
through the center of projection². To clarify this, we should mention that in motion vision it
is assumed that the rotation axis passes through the origin of the viewer-centered coordinate
system, i.e. the center of projection. But at the CMU Imaging Laboratory, the rotation
mechanism is not set up so as to make the Z axis of rotation coincide with the optical axis.
To obtain this experimental result, we have used fixation algorithms which assume that the
rotation axis passes through the center of projection, which is not true here.

According to basic kinematics, the compensating translation which results from shifting
the rotation axis is given by

$$V_0 = -\omega \times B \qquad (19)$$

where $B$ is a vector extending from a point on the real rotation axis to a point on the shifted
rotation axis. In our special case, $V_0 = -(\omega\hat{z}) \times (b\hat{x})$. In this experiment, $V_0 = -0.9\,\hat{y}$ mm
and $\omega = -0.3$ degrees. As a result we conclude that the real rotation axis is located at about
$b = -(-0.9)/((-0.3 \times \pi)/180) = -172$ mm perpendicular distance from the optical axis in
the horizontal plane.
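The arithmetic behind the -172 mm figure can be reproduced directly. The sketch below evaluates the y component of eqn. (19) for a rotation about the Z axis and an offset along X, which gives V_y = -omega * b and therefore b = -V_y / omega; the function name is an illustrative choice.

```python
import math

def rotation_axis_offset_mm(vertical_translation_mm, rotation_deg):
    """Offset b (mm) of the real rotation axis along X, from V_y = -omega * b."""
    omega = math.radians(rotation_deg)   # rotation in radians
    return -vertical_translation_mm / omega

# Values reported for the landscape experiment: V_y = -0.9 mm, omega = -0.3 degrees.
b = rotation_axis_offset_mm(-0.9, -0.3)
print(f"b = {b:.1f} mm")
```

The result, about -171.9 mm, rounds to the value quoted in the text.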

A similar method can be used for the calibration of the rotation axis which is parallel to 
the optical axis in any camera system arrangement. In order to find the real location of the 
rotation axis, the following steps should be taken: 

1- Apply a pure rotation about the axis which is supposed to be the optical axis. 

2- If $\omega_{\hat{R}_0}$ is not accurately known, compute it by applying the algorithms given in section
5 of [12] to a relatively large patch around the principal point.

3- Find the translational motion (u 0 ,v 0 ) at the principal point using eqn. (14). 

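4- Find the real location of the rotation axis using eqn. (19), which for a rotation $\omega_{\hat{R}_0}$
about the optical axis gives

$$b_x = -\frac{V}{\omega_{\hat{R}_0}}, \qquad b_y = \frac{U}{\omega_{\hat{R}_0}}$$

where $(U, V)$ is the translation corresponding to the image velocity $(u_0, v_0)$ at the principal point,
computed using $Z_0$, the depth at the principal point, and $f$, the focal length of the camera. As a result,
the real rotation axis is parallel to the optical axis and intersects the image plane at point
$(b_x, b_y)$.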

² If the CCD edges are not accurately aligned with the horizontal and vertical axes of the camera frame,
i.e. the CCD is mounted at an angle with respect to the camera coordinate system, errors of this kind
occur in both the vertical and horizontal directions. That is not the case here because the inaccuracy of
motion appears only in the vertical direction.
Figure 11: Estimated value of the vertical component of translational velocity, along the Y axis,
versus the size of fixation patch for the landscape images.

6 Conclusions 

Recovery of the fixation velocity and the component of the rotational velocity along the fixation
axis, $\omega_{\hat{R}_0}$, are important steps in our fixation method. The experimental results
presented here show that the fixation velocity and $\omega_{\hat{R}_0}$ can be computed satisfactorily using
only the information from a small patch around the fixation point. The corresponding
optimum patch sizes in these experiments are equivalent to a field of view of about 2 x 2.4
degrees. Obtaining such good motion estimates while using only a small field of view supports
the feasibility of our fixation method. This is especially important if we consider that the
nominal (not calibrated) focal length and pixel size are used in the computations.

The presented techniques for the autonomous choice of an appropriate fixation point and
an optimum fixation patch size allow us to find good estimates for the motion parameters.
Also, the method described for the calibration of the real rotation axis offers a simple solution
to an important practical problem. This problem can result in considerable error in the
motion estimates if it is not detected and compensated for.

Our goal has been to design a general motion vision system which takes any sequence of 
images as its input and recovers the motion and shape without any need to check, choose, 
and adjust parameters. Our fixation technique offers such a general system and this paper 
answers the critical issues involved in the full implementation of such an autonomous motion 
vision system. 